Global Deaths Due to Air Pollution

Elizabeth Bekele, Alison Cheek

2022-05-03

Introduction

Packages Required

#This will allow us to filter through our data 
library(tidyverse)
library(dplyr)
#This will help us plot figures to showcase our findings
library(ggplot2)
#This will help us organize and display our data as necessary 
library(knitr)
library(kableExtra)
#This expands our plot uses 
library(plotly)
#Scientific Notation Disabled 
options(scipen=999)

Pollution Data

Import the deaths-due-to-air-pollution data

deaths_df <- read.csv("death-rates-from-air-pollution.csv")

We rename the columns to shorter names and glimpse the data.

colnames(deaths_df) <- c("country", "acronym", "year", "total_deaths", "indoor_deaths", "outdoor_deaths", "ozone_deaths")

glimpse(deaths_df)
## Rows: 6,468
## Columns: 7
## $ country        <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanist…
## $ acronym        <chr> "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG",…
## $ year           <int> 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1…
## $ total_deaths   <dbl> 299.4773, 291.2780, 278.9631, 278.7908, 287.1629, 288.0…
## $ indoor_deaths  <dbl> 250.3629, 242.5751, 232.0439, 231.6481, 238.8372, 239.9…
## $ outdoor_deaths <dbl> 46.44659, 46.03384, 44.24377, 44.44015, 45.59433, 45.36…
## $ ozone_deaths   <dbl> 5.616442, 5.603960, 5.611822, 5.655266, 5.718922, 5.739…

Data Variables

Variables that interest us here include:

World Population Data

Now, let’s take a look at the population data.

world_pop <- read.csv("population_total_long.csv")
glimpse(world_pop)
## Rows: 12,595
## Columns: 3
## $ Country.Name <chr> "Aruba", "Afghanistan", "Angola", "Albania", "Andorra", "…
## $ Year         <int> 1960, 1960, 1960, 1960, 1960, 1960, 1960, 1960, 1960, 196…
## $ Count        <int> 54211, 8996973, 5454933, 1608800, 13411, 92418, 20481779,…

To get a general idea of the deaths data frame we made, let’s make a plot to see what’s happening. This is a plot of indoor versus outdoor deaths around the world, by country.

With every country on one plot, the result is too cluttered to read, so we chose two countries from each continent (one high-population and one low-population country) to graph.

We selected a high-population country from each continent and used the formula below to determine the target population for its low-population counterpart.

Low population = high population × 0.10
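The rule above can be sketched in R. This is a minimal illustration, not the selection code used for this report: the helper name `pick_low_population` and the "closest to the target" matching step are assumptions, and the candidate list is only a sample built from 1997 counts that appear in the population tables of this report.

```r
# Hypothetical helper: return the candidate whose population is closest
# to 10% of the chosen high-population country (assumed matching rule).
pick_low_population <- function(high_count, candidates, ratio = 0.10) {
  target <- high_count * ratio
  candidates$country[which.min(abs(candidates$count - target))]
}

# 1997 counts copied from the population tables in this report
candidates <- data.frame(
  country = c("Canada", "Chile", "New Zealand"),
  count   = c(29905948, 14786220, 3781300)
)

us_1997    <- 272657000                 # United States, 1997
low_target <- us_1997 * 0.10            # target for the low-population pick
low_match  <- pick_low_population(us_1997, candidates)
```

With these sample values, the candidate nearest the target is the one a reader would compare against the United States; a different candidate list or matching rule could give a different pairing.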

## Rows: 126
## Columns: 3
## Groups: Year [21]
## $ Country.Name <chr> "Australia", "Brazil", "Germany", "Nigeria", "Pakistan", …
## $ Year         <int> 1997, 1997, 1997, 1997, 1997, 1997, 1998, 1998, 1998, 199…
## $ Count        <int> 18517000, 167209040, 82034771, 113457663, 131057431, 2726…
## Rows: 126
## Columns: 3
## Groups: Year [21]
## $ Country.Name <chr> "Canada", "Chile", "Sri Lanka", "Malawi", "New Zealand", …
## $ Year         <int> 1997, 1997, 1997, 1997, 1997, 1997, 1998, 1998, 1998, 199…
## $ Count        <int> 29905948, 14786220, 18470900, 10264906, 3781300, 7596501,…

Combine Data Sets

First, let’s look at tables of the high- and low-population countries using the world population data set.

## # A tibble: 6 × 3
## # Groups:   Year [1]
##   Country.Name   Year     Count
##   <chr>         <int>     <int>
## 1 Australia      1997  18517000
## 2 Brazil         1997 167209040
## 3 Germany        1997  82034771
## 4 Nigeria        1997 113457663
## 5 Pakistan       1997 131057431
## 6 United States  1997 272657000
## # A tibble: 6 × 3
## # Groups:   Year [1]
##   Country.Name  Year    Count
##   <chr>        <int>    <int>
## 1 Canada        1997 29905948
## 2 Chile         1997 14786220
## 3 Sri Lanka     1997 18470900
## 4 Malawi        1997 10264906
## 5 New Zealand   1997  3781300
## 6 Serbia        1997  7596501

Next, we look at the deaths for the high- and low-population countries using the deaths data frame.

## # A tibble: 6 × 7
## # Groups:   year [6]
##   country   acronym  year total_deaths indoor_deaths outdoor_deaths ozone_deaths
##   <chr>     <chr>   <int>        <dbl>         <dbl>          <dbl>        <dbl>
## 1 Australia AUS      1997         22.4         0.322           21.8        0.314
## 2 Australia AUS      1998         21.5         0.284           21.0        0.305
## 3 Australia AUS      1999         20.4         0.259           19.9        0.295
## 4 Australia AUS      2000         19.4         0.240           18.9        0.290
## 5 Australia AUS      2001         18.6         0.223           18.1        0.284
## 6 Australia AUS      2002         18.1         0.211           17.7        0.286
## # A tibble: 6 × 7
## # Groups:   year [6]
##   country acronym  year total_deaths indoor_deaths outdoor_deaths ozone_deaths
##   <chr>   <chr>   <int>        <dbl>         <dbl>          <dbl>        <dbl>
## 1 Canada  CAN      1997         21.9        0.0878           19.9         2.20
## 2 Canada  CAN      1998         21.7        0.0824           19.6         2.21
## 3 Canada  CAN      1999         21.2        0.0751           19.2         2.19
## 4 Canada  CAN      2000         20.3        0.0682           18.3         2.13
## 5 Canada  CAN      2001         19.8        0.0641           17.9         2.08
## 6 Canada  CAN      2002         19.5        0.0605           17.7         2.05

Lastly, we join the population and deaths data by country and year, so that each row carries its country’s population count.

## # A tibble: 6 × 8
## # Groups:   year [6]
##   country   acronym  year total_deaths indoor_deaths outdoor_deaths ozone_deaths
##   <chr>     <chr>   <int>        <dbl>         <dbl>          <dbl>        <dbl>
## 1 Australia AUS      1997         22.4         0.322           21.8        0.314
## 2 Australia AUS      1998         21.5         0.284           21.0        0.305
## 3 Australia AUS      1999         20.4         0.259           19.9        0.295
## 4 Australia AUS      2000         19.4         0.240           18.9        0.290
## 5 Australia AUS      2001         18.6         0.223           18.1        0.284
## 6 Australia AUS      2002         18.1         0.211           17.7        0.286
## # … with 1 more variable: Count <int>
## # A tibble: 6 × 8
## # Groups:   year [6]
##   country acronym  year total_deaths indoor_deaths outdoor_deaths ozone_deaths
##   <chr>   <chr>   <int>        <dbl>         <dbl>          <dbl>        <dbl>
## 1 Canada  CAN      1997         21.9        0.0878           19.9         2.20
## 2 Canada  CAN      1998         21.7        0.0824           19.6         2.21
## 3 Canada  CAN      1999         21.2        0.0751           19.2         2.19
## 4 Canada  CAN      2000         20.3        0.0682           18.3         2.13
## 5 Canada  CAN      2001         19.8        0.0641           17.9         2.08
## 6 Canada  CAN      2002         19.5        0.0605           17.7         2.05
## # … with 1 more variable: Count <int>
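A join of this kind can be sketched as follows. The example uses base R `merge()` on small stand-in data frames (`deaths_small` and `pop_small` are hypothetical names) filled with 1997 values shown above; it is an illustration under those assumptions, not the code that produced the output in this section. In dplyr, `inner_join()` with a named `by` vector would play the same role.

```r
# Small stand-ins for the deaths and population data, 1997 values only
deaths_small <- data.frame(
  country      = c("Australia", "Canada"),
  year         = c(1997L, 1997L),
  total_deaths = c(22.4, 21.9)
)
pop_small <- data.frame(
  Country.Name = c("Australia", "Brazil", "Canada"),
  Year         = c(1997L, 1997L, 1997L),
  Count        = c(18517000L, 167209040L, 29905948L)
)

# Inner join on country and year. The key columns have different names
# in the two data sets, so both sides are spelled out.
joined <- merge(
  deaths_small, pop_small,
  by.x = c("country", "year"),
  by.y = c("Country.Name", "Year")
)
```

Brazil has no matching row in `deaths_small`, so it is dropped; only countries present in both data frames survive an inner join.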

Death Count

Which country has the highest death count?

Let’s make a table showing the high- and low-population countries and their respective average deaths due to pollution.

country        average_death_high
Australia                17.76815
Brazil                   48.42928
Germany                  28.10988
Nigeria                 112.30157
Pakistan                144.33463
United States            26.35827

country        average_death_low
Canada                  18.18542
Chile                   36.51321
Malawi                 147.77167
New Zealand             15.92536
Serbia                  80.66558
Sri Lanka               69.60383

Here’s a graph to visualize the previous table.

So we’ve looked at the deaths due to pollution, but what percentage of the population was affected?

Country.Name   average_population
Australia                21217772
Brazil                  189132292
Germany                  81914540
Nigeria                 148549958
Pakistan                168525322
United States           300447600

Country.Name   average_population
Canada                   33029774
Chile                    16555805
Malawi                   13605376
New Zealand               4214995
Serbia                    7345882
Sri Lanka                19824652
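One way to connect the death figures to the population figures is sketched below. It assumes the death values are rates per 100,000 people (as the axis labels on the bar charts in this report state) and that the averaging periods of the two tables line up; both are assumptions, and the data frame name `affected` is made up for the example.

```r
# Averages copied from the tables in this report (three countries as a sample)
affected <- data.frame(
  country            = c("Australia", "Pakistan", "Malawi"),
  avg_death_rate     = c(17.76815, 144.33463, 147.77167),  # assumed per 100,000
  average_population = c(21217772, 168525322, 13605376)
)

# A rate per 100,000 becomes a percentage when divided by 1,000
affected$percent_of_population <- affected$avg_death_rate / 1000

# Rough number of deaths implied by the rate and the average population
affected$estimated_deaths <-
  affected$avg_death_rate / 100000 * affected$average_population
```

Because the rates are already normalized by population, the percentage depends only on the rate; the population count matters only when converting back to an approximate number of deaths.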

Pollution Types

Which type of pollution has the greatest number of deaths?

## # A tibble: 6 × 4
##   country       avg_indoor avg_outdoor avg_ozone
##   <chr>              <dbl>       <dbl>     <dbl>
## 1 Australia          0.249        17.2     0.360
## 2 Brazil            19.4          26.8     2.74 
## 3 Germany            0.717        25.5     2.34 
## 4 Nigeria           75.9          35.2     2.12 
## 5 Pakistan          87.7          50.5    10.4  
## 6 United States      0.166        22.8     3.92
## # A tibble: 6 × 4
##   country     avg_indoor avg_outdoor avg_ozone
##   <chr>            <dbl>       <dbl>     <dbl>
## 1 Canada          0.0651        16.4    1.97  
## 2 Chile           8.69          27.2    0.850 
## 3 Malawi        132.            13.8    3.39  
## 4 New Zealand     0.291         15.6    0.0728
## 5 Serbia         35.9           42.7    2.94  
## 6 Sri Lanka      44.5           24.8    0.430
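To read off the leading pollution type per country without scanning the table by eye, something like the following could be used. It is a small sketch on three rows copied from the first table above; `pollution_avgs` and `leading_type` are names introduced for the example.

```r
# Three rows copied from the high-population table above
pollution_avgs <- data.frame(
  country     = c("Australia", "Nigeria", "Pakistan"),
  avg_indoor  = c(0.249, 75.9, 87.7),
  avg_outdoor = c(17.2, 35.2, 50.5),
  avg_ozone   = c(0.360, 2.12, 10.4)
)

type_cols <- c("avg_indoor", "avg_outdoor", "avg_ozone")

# max.col() returns, for each row, the index of the largest column;
# ties.method = "first" keeps the result deterministic.
pollution_avgs$leading_type <-
  type_cols[max.col(as.matrix(pollution_avgs[type_cols]), ties.method = "first")]
```

In this sample, the leading type differs between countries, which is consistent with the tables above: indoor pollution dominates in some countries and outdoor pollution in others.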

Pollution Over Time

Let’s look at the previous two decades and compare the deaths. Has there been a change?

This is the first decade, 1996-2006.

country        High_Deaths_96  High_Deaths_01  High_Deaths_06
Australia            23.04465        18.58572        14.92239
Brazil               60.67757        49.46436        41.46829
Germany              34.72325        28.38756        23.83654
Nigeria             136.08978       123.05129       102.26653
Pakistan            155.42988       151.25352       146.09296
United States        29.99271        28.93114        25.93369

country        Low_Deaths_96  Low_Deaths_01  Low_Deaths_06
Australia           22.18101       19.82451       14.92239
Brazil              46.36829       37.43188       41.46829
Germany            183.14179      165.41702       23.83654
Nigeria             93.44700       83.18333      102.26653
Pakistan            85.28997       72.16239      146.09296
United States      100.66078       95.27073       25.93369

This is the second decade, 2007-2017.

country        High_Deaths_07  High_Deaths_12  High_Deaths_17
Australia            14.92140        12.65973        10.79595
Brazil               40.42460        35.39069        30.32108
Germany              23.45850        20.91536        19.82826
Nigeria              98.90306        84.22324        81.22147
Pakistan            143.81724       133.93887       123.21548
United States        25.11756        21.98194        18.82515

country        Low_Deaths_07  Low_Deaths_12  Low_Deaths_17
Canada              16.93196       13.82968       10.71662
Chile               30.53130       27.31475       24.29921
Malawi             132.12253      116.27470      104.93508
Serbia              76.65752       72.77354       62.57853
Sri Lanka           66.05987       59.22433       38.46264
Tonga               87.81178       79.49336       70.72940
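The question of whether there has been a change can be made concrete with a percentage change between the first and last year of a decade. The sketch below does this for three high-population countries using the 1996 and 2006 values from the first-decade table; the `pct_change` column is a name introduced here for illustration.

```r
# 1996 and 2006 values copied from the first-decade, high-population table
decade_one <- data.frame(
  country        = c("Australia", "Nigeria", "Pakistan"),
  High_Deaths_96 = c(23.04465, 136.08978, 155.42988),
  High_Deaths_06 = c(14.92239, 102.26653, 146.09296)
)

# Percentage change from 1996 to 2006; negative values mean a decline
decade_one$pct_change <-
  (decade_one$High_Deaths_06 - decade_one$High_Deaths_96) /
  decade_one$High_Deaths_96 * 100
```

A relative measure like this makes countries with very different baseline levels easier to compare than the raw differences would.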

Let’s graph the previous tables!

The first decade.

This shows the second decade.

Which year had the most deaths from indoor pollution? From outdoor particulate matter? From outdoor ozone?
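For a single country, the worst year per pollution type can be looked up directly. The sketch below uses the six Australian rows (1997-2002) printed earlier in this report, so it only illustrates the lookup and says nothing about the full data set; `aus` and `worst_year` are names introduced for the example.

```r
# Australia, 1997-2002, values as printed earlier in this report
aus <- data.frame(
  year           = 1997:2002,
  indoor_deaths  = c(0.322, 0.284, 0.259, 0.240, 0.223, 0.211),
  outdoor_deaths = c(21.8, 21.0, 19.9, 18.9, 18.1, 17.7),
  ozone_deaths   = c(0.314, 0.305, 0.295, 0.290, 0.284, 0.286)
)

type_cols <- c("indoor_deaths", "outdoor_deaths", "ozone_deaths")

# For each pollution type, pick the year where the value is largest
worst_year <- sapply(aus[type_cols], function(x) aus$year[which.max(x)])
```

The result is a named vector with one year per pollution type; applying the same lookup within each country of the full data would require grouping by country first.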

Indoor Deaths

Outdoor Deaths

Ozone Deaths

Which is worse?

Outdoor or indoor pollution?

Let’s revisit a graph we looked at earlier. This time, we combine the pollutant types in a single plot.
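Putting the pollutant types on one plot usually means reshaping the data from one column per type to one row per country and type. The following is a rough sketch of that reshaping on three rows from the averages table in the Pollution Types section; the stacked bar chart at the end is only one possible way to combine the types and is an assumption, as are the names `by_type` and `long`. The plotting step runs only if ggplot2 is installed.

```r
# Three rows copied from the averages table in the Pollution Types section
by_type <- data.frame(
  country     = c("Australia", "Nigeria", "Pakistan"),
  avg_indoor  = c(0.249, 75.9, 87.7),
  avg_outdoor = c(17.2, 35.2, 50.5),
  avg_ozone   = c(0.360, 2.12, 10.4)
)
type_cols <- c("avg_indoor", "avg_outdoor", "avg_ozone")

# Wide to long: one row per country and pollution type
long <- data.frame(
  country = rep(by_type$country, times = length(type_cols)),
  type    = rep(type_cols, each = nrow(by_type)),
  rate    = unlist(by_type[type_cols], use.names = FALSE)
)

# Stacked bars, one bar per country, coloured by pollution type
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(long, ggplot2::aes(x = country, y = rate, fill = type)) +
    ggplot2::geom_col() +
    ggplot2::coord_flip()
}
```

The long layout is what lets a single `fill` aesthetic distinguish the types; `tidyr::pivot_longer()` would be the tidyverse route to the same shape.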

We cannot conclude which is worse.

We have this included already

#Mean total deaths across all years in the data (no year filter) for high-population countries
deaths_highpop_countries <- deaths_df %>% 
  filter(country %in% c('United States', 'Brazil', 'Nigeria', 'Germany', 'Pakistan', 'Australia')) %>% 
  group_by(country) %>% 
  select(total_deaths) %>% 
  summarize(average_death_high = mean(total_deaths))
## Adding missing grouping variables: `country`
#Mean total deaths from 1996 onward (year > 1995) for low-population countries
deaths_lowpop_countries<- deaths_df %>% 
  filter(year> 1995 & country %in% c('Canada', 'Chile', 'Malawi', 'Serbia', 'Sri Lanka', 'New Zealand')) %>% 
  group_by(country) %>% 
  select(total_deaths) %>% 
  summarize(average_death_low = mean(total_deaths))
## Adding missing grouping variables: `country`
#death_lowpop_countries
kable(list(deaths_highpop_countries, deaths_lowpop_countries))
country        average_death_high
Australia                17.76815
Brazil                   48.42928
Germany                  28.10988
Nigeria                 112.30157
Pakistan                144.33463
United States            26.35827

country        average_death_low
Canada                  16.86963
Chile                   32.58415
Malawi                 140.50830
New Zealand             14.08771
Serbia                  78.12194
Sri Lanka               65.51438
ggplot(deaths_highpop_countries)+
  geom_col(mapping = aes(x=country, y=average_death_high))+
             xlab("Country")+
             ylab("Average deaths (per 100,000)")+
             ggtitle("Average total deaths in high-population countries")+
  coord_flip()

ggplot(deaths_lowpop_countries)+
  geom_col(mapping = aes(x=country, y=average_death_low))+
             xlab("Country")+
             ylab("Average deaths (per 100,000)")+
             ggtitle("Average total deaths in low-population countries")+
  coord_flip()

Summary

Sources